Approval Sheet

Author

  • Kanishka Bhaduri
Abstract

Title of Dissertation: Efficient Local Algorithms for Distributed Data Mining in Large Scale Peer to Peer Environments: A Deterministic Approach

Kanishka Bhaduri, Doctor of Philosophy, 2008

Thesis directed by: Dr. Hillol Kargupta, Associate Professor, Department of Computer Science and Electrical Engineering

Peer-to-peer (P2P) systems such as Gnutella, Napster, e-Mule, Kazaa, and Freenet are becoming increasingly popular for many applications that go beyond downloading music files without paying for them. Examples include P2P systems for network storage, web caching, searching and indexing of relevant documents, and distributed network-threat analysis. These environments are rich in data, and this data, if mined, can provide a valuable source of information. Mining the web cache of users, for example, may often give information about their browsing patterns, leading to efficient searching, resource utilization, query routing and more. However, most off-the-shelf data analysis techniques are designed for centralized applications where the entire data is stored in a single location. These techniques do not work in a highly decentralized, distributed environment such as a P2P network. We need distributed data mining algorithms that are fundamentally local, scalable, decentralized, asynchronous and anytime to solve this problem.

This research proposes DeFraLC: a Deterministic Framework for Local Computation of functions defined on data distributed in large scale (peer to peer) systems. Computing global data models in such environments can be very expensive. Moving all or some of the data to a central location does not work because of the high cost involved in centralization. The cost increases even more under a dynamic scenario where the peers' data and the network topology change arbitrarily. In this dissertation we have focused on developing algorithms for deterministic function computation in large scale P2P environments.

Our algorithmic framework is local, which means that a peer can compute a function based on the information of only a handful of nearby neighbors, and the communication overhead of the algorithm is upper bounded by some constant, independent of the size of the system. As a consequence, several messages can be pruned, leading to excellent scalability of our algorithms. The first algorithm that we have developed, PeGMA (Peer-to-Peer Generic Monitoring Algorithm), is capable of computing complex functions defined on the average of the horizontally distributed data. This generic algorithm is extremely accurate, highly scalable and can seamlessly adapt to changes in the data or the network. Following PeGMA, several interesting algorithms can be developed, such as L2 norm monitoring of distributed data, which is a very powerful primitive.

Using a two step feedback loop, a number of data mining algorithms have been proposed. The first step uses the local algorithm to raise a flag whenever the current data does not fit the function. The second step uses a feedback loop to sample data from the network and build a new function. The correctness of the local algorithm guarantees that once the computation terminates, each peer has the same result as it would in a centralized scenario. We propose solutions for P2P k-means monitoring, eigen monitoring and multivariate regression in P2P environments. Furthermore, we have shown how a complex data mining algorithm such as decision tree induction can be developed for P2P environments. Finally, we have implemented all of the algorithms in a Distributed Data Mining Toolkit (DDMT) [44] developed at the DIADIC research lab at UMBC. Our extensive experimental results show that the proposed algorithms are accurate, efficient and highly scalable.
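The two step feedback loop described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's local algorithm: it is a centralized simulation that computes the global average directly, which is exactly the step PeGMA is designed to avoid. The function names (`needs_rebuild`, `rebuild_model`, `monitor_step`), the use of a plain average as the "model", and the parameters `epsilon` and `sample_size` are all assumptions made for illustration.

```python
import math
import random


def average(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def needs_rebuild(peer_data, model, epsilon):
    """Step 1 (assumed form): raise a flag when the average of the peers'
    data lies more than epsilon (in L2 norm) from the current model."""
    return l2_distance(average(peer_data), model) > epsilon


def rebuild_model(peer_data, sample_size, rng):
    """Step 2 (assumed form): sample peers and build a new model,
    here simply the average of the sampled vectors."""
    sample = rng.sample(peer_data, min(sample_size, len(peer_data)))
    return average(sample)


def monitor_step(peer_data, model, epsilon, sample_size, rng):
    """One pass of the loop: returns (model, flagged)."""
    if needs_rebuild(peer_data, model, epsilon):
        return rebuild_model(peer_data, sample_size, rng), True
    return model, False
```

In this sketch the model is left untouched while the data still fits it, and a new model is built from sampled data only when the flag is raised, which mirrors the division of labor between the monitoring step and the feedback step.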


Similar resources

Georgia Institute of Technology Office of Contract Administration Notice of Project Closeout

Defense Priority Rating: Military Security Classification: Unclassified (or) Company/Industrial-Proprietary: _ RESTRICTIONS See Attached Gov't Supplemental Information Sheet for Additional Requirements. Travel: Foreign travel must have prior approval; contact OCA in each case. Domestic travel requires sponsor approval where total will exceed greater of $500 or 125% of approved proposal budget...


Integrated Safeguards Data Sheet (Initial)

Authorized to Appraise Date: October 6, 2003 IBRD Amount ($m): Bank Approval: February 27, 2004 IDA Amount ($m): Global Supplemental Amount ($m): 5.50 Managing Unit: AFTES Lending Instrument: Specific Investment Loan (SIL) Status: Lending Sector: General agriculture, fishing and forestry sector Theme: Biodiversity (P); Other environment and natural resources management (P); Environmental polici...


Articular cartilage regeneration using cell sheet technology.

Cartilage damage is typically treated by chondrocyte transplantation, mosaicplasty, or microfracture. Recent advances in tissue engineering have prompted research on techniques to repair articular cartilage damage using a variety of transplanted cells. We studied the repair and regeneration of cartilage damage using layered chondrocyte sheets prepared in a temperature-responsive culture dish. W...


Design of an Iris Verification System on Embedded Blackfin Processor for Access Control Application Richard Ng Yew Fatt Master of Engineering Science Faculty of Engineering and Science Universiti

ACKNOWLEDGEMENTS; APPROVAL SHEET; SUBMISSION SHEET; DECLARATION; LIST OF TABLES; LIST OF FIGURES; LIST OF ABBREVIATIONS; CHAPTER 1.0 INTRODUCTION; 1.1 Background; 1.2 Motivation; 1.3 Scope of Work; 1.4 Objective; 1.5 Thesis Outline; 2.0 LITERATURE REVIEW; 2.1 Image Preprocessing; 2.1.1 Iris Localization; 2.1.1.1 Integro-differential operator; 2.1.1.2 Hough Transform ...




Publication date: 2010